Fast Asynchronous Anti-TrustRank for Web Spam Detection
Authors: Joyce Jiyoung Whang, Yeon Seong Jeong, Inderjit S. Dhillon, Seonggoo Kang, Jungmin Lee
Abstract
Web spam detection is an important problem in Web search. Because Web spam pages tend to have many spurious links, many Web spam detection algorithms exploit the hyperlink structure between Web pages to detect spam pages. The Anti-TrustRank algorithm is a well-known link-based spam detection algorithm that follows the principle that spam pages are likely to be referenced by other spam pages. Since a real-world Web graph involves tens of billions of nodes, it is crucial to develop work-efficient Web spam detection algorithms. In this paper, we develop asynchronous Anti-TrustRank algorithms that significantly reduce the number of arithmetic operations compared to the traditional synchronous Anti-TrustRank algorithm without degrading performance in detecting Web spam. We theoretically prove the convergence of the asynchronous Anti-TrustRank algorithms and conduct experiments on a real-world Web graph indexed by NAVER, the most popular search engine in Korea.

ACM Reference Format: Joyce Jiyoung Whang, Yeon Seong Jeong, Inderjit S. Dhillon, Seonggoo Kang, and Jungmin Lee. 2018. Fast Asynchronous Anti-TrustRank for Web Spam Detection. In Proceedings of WSDM workshop on Misinformation and Misbehavior Mining on the Web (MIS2). ACM, New York, NY, USA, 4 pages. https://doi.org/10.475/123_4
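To make the contrast between synchronous and asynchronous updates concrete, the following is a minimal sketch and not the paper's implementation. It assumes Anti-TrustRank is computed as a seed-biased PageRank on the reversed link graph, so that distrust flows from a known spam page to the pages that link to it. The asynchronous variant uses a residual-based push scheme with a work queue, which is one common way to do asynchronous updates and is an assumption here. All function names, the damping factor `alpha=0.85`, and the tolerance are hypothetical choices for illustration.

```python
from collections import defaultdict, deque


def reverse_graph(out_links):
    """Map each page to the list of pages that link to it."""
    in_links = defaultdict(list)
    for src, dsts in out_links.items():
        in_links[src]  # make sure pages without referrers still get an entry
        for dst in dsts:
            in_links[dst].append(src)
    return dict(in_links)


def anti_trustrank_sync(out_links, spam_seeds, alpha=0.85, iters=100):
    """Synchronous sketch: every page is updated in every sweep."""
    in_links = reverse_graph(out_links)
    nodes = list(in_links)
    seed = {v: (1.0 / len(spam_seeds) if v in spam_seeds else 0.0) for v in nodes}
    score = dict(seed)
    for _ in range(iters):
        nxt = {v: (1 - alpha) * seed[v] for v in nodes}
        for v in nodes:
            referrers = in_links[v]
            if referrers:  # pages nobody links to simply drop their share
                share = alpha * score[v] / len(referrers)
                for u in referrers:
                    nxt[u] += share
        score = nxt
    return score


def anti_trustrank_async(out_links, spam_seeds, alpha=0.85, tol=1e-10):
    """Asynchronous sketch (assumed residual-push scheme): only pages
    whose pending residual exceeds tol are processed."""
    in_links = reverse_graph(out_links)
    score = {v: 0.0 for v in in_links}
    residual = {v: 0.0 for v in in_links}
    for s in spam_seeds:
        residual[s] = (1 - alpha) / len(spam_seeds)
    work = deque(spam_seeds)
    queued = set(spam_seeds)
    while work:
        v = work.popleft()
        queued.discard(v)
        r, residual[v] = residual[v], 0.0
        score[v] += r  # absorb the pending residual into the score
        referrers = in_links[v]
        for u in referrers:
            residual[u] += alpha * r / len(referrers)
            if residual[u] > tol and u not in queued:
                work.append(u)
                queued.add(u)
    return score


# Toy graph (hypothetical): page -> pages it links to.
web = {"a": ["spam"], "b": ["a"], "good": ["b"],
       "spam": ["spam2"], "spam2": ["spam"]}
sync_scores = anti_trustrank_sync(web, {"spam"})
async_scores = anti_trustrank_async(web, {"spam"})
```

In this toy graph, both variants should give the seed page the highest score, with scores decreasing for pages that are more link hops away from it. The asynchronous version only touches pages that still have work pending, which is the kind of saving in arithmetic operations the abstract refers to, although the paper's actual update rules may differ.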
Similar resources
Link-Based Characterization and Detection of Web Spam
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...
Link-Based Spam Algorithms in Adversarial Information Retrieval
Web spam has become one of the most exciting challenges and threats to Web search engines. The relationship between search systems and those who try to manipulate them gave rise to the field of adversarial information retrieval. In this paper, we have set up several experiments to compare HostRank and TrustRank to show how effective it is for TrustRank to combat Web spam and we have also re...
SIGIR 2006 Workshop on Adversarial Information Retrieval on the Web AIRWeb 2006
We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...
Anti-Trust Rank for Detection of Web Spam and Seed Set Expansion
In recent times, the Web has been the most popular and perhaps the most efficient platform for sharing, storing, and retrieving information. Finding the required information on the Web is facilitated by search engines. Search engines form the interface between the Web and the users. Given the vast amount of information available on the Web, search engines must pick a small subset of...
Web Spam Detection with Anti-Trust Rank
Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages whose information is of dubious quality, but it is practically infeasible to use human effort for a large number of pages. Similar to the approach in [1], we propose a method of selecting a seed set of pages to be evalua...